# COCO Fine-tuning

| Model | License | Description | Tags | Author | Downloads | Likes |
| --- | --- | --- | --- | --- | --- | --- |
| My Model | MIT | GIT is a Transformer-based image-to-text generation model that generates descriptive text from input images. | Image-to-Text, PyTorch, Supports Multiple Languages | anoushhka | 87 | 0 |
| Git Large Coco | MIT | GIT is a Transformer-based image-to-text generation model that generates descriptive text from input images. | Image-to-Text, Transformers, Supports Multiple Languages | alexgk | 25 | 0 |
| Detr Resnet 50 Base Coco | Apache-2.0 | An object detection model based on facebook/detr-resnet-50 and fine-tuned on the COCO dataset. | Object Detection, Transformers | amyeroberts | 20 | 1 |
| Vit Swin Base 224 Gpt2 Image Captioning | MIT | An image captioning model built on the VisionEncoderDecoder architecture, with Swin Transformer as the visual encoder and GPT-2 as the decoder, fine-tuned on the COCO2014 dataset. | Image-to-Text, Transformers, English | Abdou | 321 | 2 |
| Git Large R Coco | MIT | GIT is a Transformer-based generative image-to-text model that generates descriptive text from images. | Image-to-Text, Transformers, Supports Multiple Languages | microsoft | 86 | 10 |
| Git Large Coco | MIT | GIT is a Transformer decoder-based vision-language model that can generate image captions and perform visual question answering. | Image-to-Text, Transformers, Supports Multiple Languages | microsoft | 6,582 | 103 |
| Git Base Coco | MIT | GIT is a Transformer decoder conditioned on CLIP image tokens and text tokens, used for tasks such as image captioning and visual question answering. | Image-to-Text, Transformers, Supports Multiple Languages | microsoft | 5,461 | 19 |
| Mengzi Oscar Base Retrieval | Apache-2.0 | A Chinese image-text retrieval model based on the Chinese multimodal pretraining model Mengzi-Oscar and fine-tuned on the COCO-ir dataset. | Text-to-Image, Transformers, Chinese | Langboat | 17 | 3 |